This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Utilize Polars.DataFrame for performance in ModelbitComponent #80

Merged 22 commits on Oct 16, 2023

Conversation

ykeremy
Contributor

@ykeremy ykeremy commented Oct 4, 2023

  • Does this PR have an impact on the local development experience? If yes, make sure you have a plan and add documentation to address the issues that come with the change
  • bump version
  • make a release
  • publish to pypi service

@wintonzheng
Contributor

❤️ I love the implementation. I have a question about the composite identifier, though.

Contributor

@wintonzheng wintonzheng left a comment

🎉

)
df = df.filter(pl.col(feature_name).is_not_null())
if len(df) > 1:
    raise WyvernFeatureValueError(
Contributor

Can we add a log line or a reason here?
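One way to address this request is to build a single reason string, log it, and pass the same string to the exception. The sketch below is only an illustration under stated assumptions: it uses plain Python dictionaries in place of a Polars DataFrame, and `FeatureValueError`, `single_feature_value`, and the `"identifier"` key are hypothetical names, not the ones used in the PR.

```python
import logging

logger = logging.getLogger(__name__)


class FeatureValueError(Exception):
    """Hypothetical stand-in for the project's WyvernFeatureValueError."""


def single_feature_value(rows, identifier, feature_name):
    """Return the single non-null value of feature_name for identifier.

    rows is a list of dicts standing in for DataFrame rows (an assumption).
    Returns None when no non-null value exists.
    """
    values = [
        row[feature_name]
        for row in rows
        if row["identifier"] == identifier and row.get(feature_name) is not None
    ]
    if len(values) > 1:
        # Build the reason once so the log line and the exception agree.
        reason = (
            f"Expected at most one non-null value for feature "
            f"{feature_name!r} and identifier {identifier!r}, "
            f"found {len(values)}"
        )
        logger.error(reason)
        raise FeatureValueError(reason)
    return values[0] if values else None
```

Keeping the message in one variable means the log output and the raised error cannot drift apart when the wording changes.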

Contributor

@wintonzheng wintonzheng left a comment

small comments to address

wyvern/components/features/feature_store.py (outdated, resolved)
wyvern/components/features/feature_retrieval_pipeline.py (outdated, resolved)
A DataFrame that contains all the real-time features.
"""
grouped_features = defaultdict(list)
for key, value in real_time_feature_dfs:
Contributor

Should this be a dict comprehension?

Contributor

Or is n small (around 5)?

Contributor Author

Good question.

I don't think we can just use a dict comprehension: we're appending to a list because key collisions are acceptable, and a comprehension would keep only the last value for each key.

However, n isn't necessarily a small number either. I'd say let's just see how this performs first. I believe the ideal upgrade here might be changing how the real time features are calculated.
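To make the collision point concrete, here is a minimal sketch comparing the two approaches. The keys and values are made-up placeholders (plain strings standing in for the real-time feature DataFrames), not data from the PR.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs; "user:1" appears twice on purpose.
pairs = [("user:1", "f_a"), ("product:9", "f_b"), ("user:1", "f_c")]

# A dict comprehension keeps only the last value seen for a repeated key.
as_comprehension = {key: value for key, value in pairs}

# Appending to a defaultdict(list) keeps every value per key, in order.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(as_comprehension["user:1"])  # f_c
print(grouped["user:1"])           # ['f_a', 'f_c']
```

Both versions walk the input once, so the loop costs the same order of work as a comprehension would; the difference is only in what happens to colliding keys.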

@ykeremy ykeremy merged commit 3c4a9e6 into main Oct 16, 2023
2 checks passed
@ykeremy ykeremy deleted the ykeremy/polars-perf-upgrade branch October 16, 2023 20:20

3 participants